APPARATUS AND METHOD FOR ESTIMATING CARDINALITY WHEN DATA SKEW IS PRESENT
ABDO ET AL.
DOCKET NO. ROC920020192US1



1/7 



[Block diagram: reference numerals 100, 110, 160; Main Memory 120 containing Data, Operating System, Database, Database Manager, Database Query, DB Query Optimizer, and Cardinality Estimator (reference numerals 121-127)]

FIG. 1 




2/7 



Select count(*) from T where T.A < b group by T.A 
Ca' = Ca(1 - (1 - 1/Ca)^X)
where 

X = Number of Rows in Intermediate Dataset 
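
The formula above reads as the expected number of distinct values that survive when X rows are drawn from a column with Ca equally likely values. A minimal Python sketch of that reading follows; the function name and the zero-input guard are illustrative assumptions, not part of the drawing.

```python
def uniform_distinct_estimate(ca: float, x: float) -> float:
    """Estimate Ca', the distinct values of a column remaining in an
    intermediate dataset of x rows, assuming all ca values are equally likely."""
    if ca <= 0 or x <= 0:
        # Assumption: an empty column or an empty dataset yields no distinct values.
        return 0.0
    # Ca' = Ca * (1 - (1 - 1/Ca)^X)
    return ca * (1.0 - (1.0 - 1.0 / ca) ** x)


if __name__ == "__main__":
    # Illustrative inputs only: 100,000 distinct values, 333,000 surviving rows.
    print(round(uniform_distinct_estimate(100_000, 333_000)))
```

Because the formula treats every value as equally likely, it has no way to account for a few values that occupy a large share of the rows.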
Ca' = P + M(1 - (1 - 1/M)^Y)
where 
M = Ca - (P+Q) 
Y = X - Fi 

X = number of rows in intermediate dataset 
P = number of distinct skewed values in X 
Q = number of distinct skewed values not in X 
Fi = sum of frequencies for all skewed values 
in X above predetermined threshold 
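
The skew-adjusted formula can be sketched in Python as below. This is one possible reading, not the claimed implementation: the function name, the parameter names, and the fallback to P when M or Y is not positive are assumptions added for illustration.

```python
from typing import Sequence


def skew_adjusted_estimate(
    ca: float,
    x: float,
    skewed_freqs_in_x: Sequence[float],
    q: int,
) -> float:
    """Estimate Ca' when some values are skewed.

    ca                -- distinct values in the column (Ca)
    x                 -- rows in the intermediate dataset (X)
    skewed_freqs_in_x -- frequency of each skewed value present in X
    q                 -- number of distinct skewed values not in X (Q)
    """
    p = len(skewed_freqs_in_x)   # P: distinct skewed values in X
    fi = sum(skewed_freqs_in_x)  # Fi: rows taken up by those skewed values
    m = ca - (p + q)             # M: distinct non-skewed values
    y = x - fi                   # Y: rows left for the non-skewed values
    if m <= 0 or y <= 0:
        # Assumption: with no non-skewed values or rows left, only the
        # P skewed values remain.
        return float(p)
    # Ca' = P + M * (1 - (1 - 1/M)^Y)
    return p + m * (1.0 - (1.0 - 1.0 / m) ** y)
```

The skewed values are counted exactly (the P term), and the uniform formula is applied only to the remaining M values and Y rows. With no skewed values (P = Q = 0, Fi = 0) the expression reduces to Ca(1 - (1 - 1/Ca)^X).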
Y = (1,000,000*0.333) - (150,000 + 100,000)
= 83,000 
M = 100,000 - (2+1)
= 99,997 
Ca' = 2 + 99,997(1 - (1 - 1/99,997)^83,000)
= 56,397 
Ca' = min(100,000, 333,000)
= 100,000 
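
The worked figures on this sheet can be re-derived step by step with the short script below. The variable names are illustrative, and reading 1,000,000 as the table's row count and 0.333 as the predicate's selectivity is an assumption; the sheet gives only the numbers.

```python
# Inputs as they appear in the worked figures (interpretations are assumed).
total_rows = 1_000_000            # assumed: rows in table T
selectivity = 0.333               # assumed: fraction of rows passing the predicate
ca = 100_000                      # Ca: distinct values in the column
freqs_in_x = [150_000, 100_000]   # frequencies of the skewed values in X
q = 1                             # Q: distinct skewed values not in X

x = round(total_rows * selectivity)   # X: rows in the intermediate dataset
p = len(freqs_in_x)                   # P = 2
y = x - sum(freqs_in_x)               # Y = X - Fi
m = ca - (p + q)                      # M = Ca - (P + Q)

# Skew-adjusted estimate: Ca' = P + M * (1 - (1 - 1/M)^Y)
skew_estimate = p + m * (1 - (1 - 1 / m) ** y)

# Simple capped estimate: Ca' = min(Ca, X)
bound_estimate = min(ca, x)

print(f"Y = {y:,}  M = {m:,}")
print(f"skew-adjusted Ca' = {round(skew_estimate):,}")
print(f"min(Ca, X)     Ca' = {bound_estimate:,}")
```

The capped estimate returns every distinct value of the column, while the skew-adjusted estimate is a little over half of that, because 250,000 of the 333,000 rows belong to just two values.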